Topic 1 Filesystem (2)

Leedehai
Friday, April 07, 2017
Monday, April 10, 2017

2.4 Filesystems

2.4.1 An overview

We assume the storage device is a local hard drive here, for simplicity.

Inode.

We will use the Unix V6 filesystem (1975) as an example to explain how filesystems work.

2.4.2 Layering for the Unix filesystem

Layering: as a form of modularity, layering is a basic technique for designing large-scale systems. The idea is to break the whole task into layers (or steps) and tackle each one individually. Besides filesystems, compiler implementations and the Internet protocol stack use the same strategy.
Pictorially, layers are stacked vertically while the steps of a pipeline are aligned horizontally, but the idea is essentially the same.

                    USER
          ┌────────────────────────┐
        ▲ │  Symbolic Link Layer   │ pathname ▶ pathname
        │ ├────────────────────────┤
user    │ │Absolute Path Name Layer│ absolute pathname ▶ inode#
friendly│ ├────────────────────────┤
        ▼ │    Path Name Layer     │ pathname ▶ inode#
          ├────────────────────────┤
interface │    File Name Layer     │ file name ▶ inode#
          ├────────────────────────┤
        ▲ │   Inode Number Layer   │ inode# ▶ inode
machine │ ├────────────────────────┤
friendly│ │       File Layer       │ index# ▶ block# (blocks form a file)
        │ ├────────────────────────┤
        ▼ │      Block Layer       │ block# ▶ block data
          └────────────────────────┘
                   MACHINE

2.4.3 The layers explained

2.4.3.1 The block layer

2.4.3.2 The file layer (KOB)

struct inode {                /* one instance for a file */
    int block_numbers[N];     /* N = 8 */
    int fileSize_in_bytes;
    int type;                 /* regular file, directory file, or symbolic link file */
    ...                       /* other metadata */
};
◀────────────── Inode table ─────────────────▶
          ◀─ block ──▶
┌─────────┬──┬──┬────┬───────────────────────┐
│         │  │██│    │                       │
└─────────┴──┴──┴────┴───────────────────────┘
┌────────────┴──────────────────┐
◇                               ◇
┌───────────────────────────────┐
│Inode                          │
│                               │
│ type: regular file            │   Note: an inode just occupies
│ permission: rwxr-x--x         │   a sliver of a block.
│ owner ID: Leedehai            │
│ reference count: 1            │   (You can examine inode's
│ last modification time: ..    │   content using shell
│ ...                           │   command "stat")
│                index=0 1 ... 7│
│ block numbers┌─┬─┬─┬─┬─┬─┬─┬─┐│
│              └─┴─┴─┴─┴─┴─┴─┴─┘│
└───────────────────────────────┘

          ┌─┬─┬───┐
          │ │█│   │  Inode (inside an
          └─┴┬┴───┘  inode block)
    ┌────────┴┬───────────────┐
    ▼ 1       ▼ 2             ▼ 8
┏━━━━━━━┓ ┏━━━━━━━┓       ┏━━━━━━━┓
┃payload┃ ┃payload┃  ...  ┃payload┃
┃ block ┃ ┃ block ┃       ┃ block ┃
┗━━━━━━━┛ ┗━━━━━━━┛       ┗━━━━━━━┛
Note: an inode just occupies a sliver of a block.

                   ┌─┬─┬───┐
                   │ │█│   │  Inode (inside an
                   └─┴┬┴───┘  inode block)
             ┌────────┴┬───────────────┬─────────────┐
             ▼ 1       ▼ 2             ▼ 7           ▼ 8
         ┌───────┐ ┌───────┐       ┌───────┐     ┌───────┐
         │ indi. │ │ indi. │  ...  │ indi. │     │d.indi.│
         │ block │ │ block │       │ block │     │ block │
         └───┬───┘ └───────┘       └───┬───┘     └───┬───┘
    ┌────────┴────┐   ...            ──┴───┐         ├────────────┐
    ▼ 1...256     ▼                        ▼ 256     ▼ 1...256    ▼
┏━━━━━━━┓     ┏━━━━━━━┓                ┏━━━━━━━┓ ┌───────┐    ┌───────┐
┃payload┃ ... ┃payload┃                ┃payload┃ │ indi. │    │ indi. │
┃ block ┃     ┃ block ┃  ... ... ...   ┃ block ┃ │ block │ .. │ block │
┗━━━━━━━┛     ┗━━━━━━━┛                ┗━━━━━━━┛ └───┬───┘    └───┬───┘
    1            256                   ┌─────────────┤    ...    ─┴┐
                                       ▼ 1...256     ▼             ▼ 256
                                   ┏━━━━━━━┓     ┏━━━━━━━┓     ┏━━━━━━━┓
                                   ┃payload┃ ... ┃payload┃ ... ┃payload┃
                                   ┃ block ┃     ┃ block ┃     ┃ block ┃
                                   ┗━━━━━━━┛     ┗━━━━━━━┛     ┗━━━━━━━┛
Note:
- an inode just occupies a sliver of a block.
- suppose a (doubly) indirect block can store 256 block numbers.
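
To make the file layer's mapping (index# ▶ block#) concrete, here is a minimal C sketch that follows the two layouts drawn above: a small file whose 8 slots point directly at payload blocks, and a large file whose first 7 slots point at indirect blocks and whose 8th slot points at a doubly indirect block. The in-memory toy disk, the names toy_inode, is_large, read_block_number, write_block_number and index_to_block_number, the 0-based indices, and the choice of 2-byte block numbers in 512-byte blocks (which yields the 256 numbers per block supposed in the note) are all assumptions made for illustration, not the actual Unix V6 source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE      512   /* assumed: 512 bytes / 2-byte numbers = 256 */
#define NUMS_PER_BLOCK  256   /* block numbers per (doubly) indirect block */
#define N_SLOTS         8     /* N in struct inode */
#define DISK_BLOCKS     1024  /* hypothetical size of the toy disk */

/* Toy in-memory "disk": the block layer maps block# -> block data. */
static uint8_t disk[DISK_BLOCKS][BLOCK_SIZE];

struct toy_inode {
    uint16_t block_numbers[N_SLOTS];
    int is_large;   /* assumption: a flag selects the indirect layout */
};

/* Block layer: block# -> block data. */
static const uint8_t *block_number_to_block(uint16_t b) {
    assert(b < DISK_BLOCKS);
    return disk[b];
}

/* Read the k-th block number stored inside (indirect) block b. */
static uint16_t read_block_number(uint16_t b, int k) {
    uint16_t n;
    assert(k >= 0 && k < NUMS_PER_BLOCK);
    memcpy(&n, block_number_to_block(b) + 2 * k, sizeof n);
    return n;
}

/* Helper for building examples: store a block number into block b. */
static void write_block_number(uint16_t b, int k, uint16_t value) {
    assert(b < DISK_BLOCKS && k >= 0 && k < NUMS_PER_BLOCK);
    memcpy(disk[b] + 2 * k, &value, sizeof value);
}

/* File layer: index of a block within the file -> block#.
 * Returns -1 when the index lies beyond what the layout can address. */
static int index_to_block_number(const struct toy_inode *ino, long index) {
    if (index < 0)
        return -1;
    if (!ino->is_large)   /* small file: the 8 slots are direct */
        return index < N_SLOTS ? ino->block_numbers[index] : -1;

    /* Large file: slots 0..6 are singly indirect (7 * 256 blocks). */
    long single_span = (long)(N_SLOTS - 1) * NUMS_PER_BLOCK;
    if (index < single_span) {
        uint16_t indirect = ino->block_numbers[index / NUMS_PER_BLOCK];
        return read_block_number(indirect, (int)(index % NUMS_PER_BLOCK));
    }

    /* Slot 7 is doubly indirect (256 * 256 further blocks). */
    index -= single_span;
    if (index >= (long)NUMS_PER_BLOCK * NUMS_PER_BLOCK)
        return -1;
    uint16_t dbl = ino->block_numbers[N_SLOTS - 1];
    uint16_t indirect = read_block_number(dbl, (int)(index / NUMS_PER_BLOCK));
    return read_block_number(indirect, (int)(index % NUMS_PER_BLOCK));
}
```

Under these assumptions a small file covers at most 8 blocks, while a large file covers up to 7 × 256 + 256 × 256 = 67,328 blocks. The price is one or two extra block reads per lookup.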

2.4.3.3 The inode number layer

2.4.3.4 The file name layer (the machine-user interface of the filesystem)

CS110 (directory)
├─ Syllabus.txt (regular file)
└─ Homeworks (directory)
   ├─ HW1 (directory)
   └─ HowToSubmit.txt (regular file)

     inode 1001 (CS110's inode) block 9105
    ─┌───────────────┬ ─        ┌──────────────────────────────────┐
     │type: dir.     │          │                                  │
     │permission:..  │          │"Syllabus.txt": inode 2007        │─ ┐
     │ref cnt:..     │─────────▶│"Homeworks": inode 2011           │
     │block#: 9105   │          │                                  │  │
    ─└───────────────┴ ─        └──────────────────────────────────┘
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
│                               block 9397
     inode 2007                 ┌──────────────────────────────────┐
│    (Syllabus.txt's inode)     │                                  │
    ─┌───────────────┬ ─     ┌─▶│Content..                         │
│    │type: reg.     │       │  │                                  │
     │permission:..  │       │  └──────────────────────────────────┘
├ ─▷ │ref cnt:..     │───────┤  block 9395
     │block#: 9397,  │       │  ┌──────────────────────────────────┐
│    │        9395   │       │  │                                  │
    ─└───────────────┴ ─     └─▶│Content..                         │
│                               │                                  │
     inode 2011                 └──────────────────────────────────┘
│    (Homeworks' inode)         block 9301
    ─┌───────────────┬ ─        ┌──────────────────────────────────┐
│    │type: dir.     │          │                                  │
     │permission:..  │          │"HW1": inode 1015                 │
└ ─▷ │ref cnt:..     │─────────▶│"HowToSubmit.txt": inode 1070     │
     │block#: 9301   │          │                                  │
    ─└───────────────┴ ─        └──────────────────────────────────┘
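
As the figure suggests, a directory's payload block is essentially a table of (file name, inode number) pairs, so the file name layer's job (file name ▶ inode#) reduces to a search of that table. Here is a minimal C sketch of the lookup. The dir_entry layout with a fixed-size name field, NAME_MAX_LEN, and the function name name_to_inode_number are hypothetical, and only the names and inode numbers are taken from the CS110 example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_LEN 31   /* hypothetical limit on a file name's length */

/* One row of a directory's payload block (assumed layout). */
struct dir_entry {
    char name[NAME_MAX_LEN + 1];
    int  inode_number;
};

/* Contents of block 9105, the payload of CS110's inode 1001. */
static const struct dir_entry cs110_dir[] = {
    { "Syllabus.txt", 2007 },
    { "Homeworks",    2011 },
};

/* File name layer: file name -> inode#, searched within ONE directory.
 * Returns -1 if the directory has no entry with that name. */
static int name_to_inode_number(const struct dir_entry *entries,
                                size_t count, const char *name) {
    assert(entries != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i].name, name) == 0)
            return entries[i].inode_number;
    }
    return -1;
}
```

For example, name_to_inode_number(cs110_dir, 2, "Homeworks") yields 2011, the inode whose payload block 9301 is itself a directory table. A linear scan is the simplest choice here and is adequate for small directories.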

2.4.3.5 The path name layer & the absolute path name layer

A similar hierarchical mechanism is used in the Internet's domain name resolution (DNS).
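
The path name layer (pathname ▶ inode#) can be sketched as a loop over the file name layer: split the path at each '/', look up one component in the current directory, and continue from the inode that lookup returns. The C sketch below is an illustration under stated assumptions, not the Unix V6 code. The toy_dir table, lookup, and path_to_inode_number are hypothetical names, and the directory contents are the ones from the CS110 example. The absolute path name layer would work the same way, starting from the root directory's inode instead of a caller-supplied one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct dir_entry { const char *name; int inode_number; };

/* One directory: its inode# and the entries in its payload block. */
struct toy_dir {
    int inode_number;
    const struct dir_entry *entries;
    size_t count;
};

static const struct dir_entry cs110_entries[] = {
    { "Syllabus.txt", 2007 }, { "Homeworks", 2011 },
};
static const struct dir_entry homeworks_entries[] = {
    { "HW1", 1015 }, { "HowToSubmit.txt", 1070 },
};
/* Toy stand-in for the lower layers: inode# -> directory contents. */
static const struct toy_dir dirs[] = {
    { 1001, cs110_entries,     2 },
    { 2011, homeworks_entries, 2 },
};

/* File name layer: look up one name (pointer + length) in one directory.
 * Returns -1 if dir_inode is not a known directory or the name is absent. */
static int lookup(int dir_inode, const char *name, size_t len) {
    for (size_t d = 0; d < sizeof dirs / sizeof dirs[0]; d++) {
        if (dirs[d].inode_number != dir_inode)
            continue;
        for (size_t i = 0; i < dirs[d].count; i++) {
            const char *n = dirs[d].entries[i].name;
            if (strlen(n) == len && memcmp(n, name, len) == 0)
                return dirs[d].entries[i].inode_number;
        }
        return -1;
    }
    return -1;
}

/* Path name layer: resolve "a/b/c" one component at a time,
 * starting from the directory whose inode# is start_dir_inode. */
static int path_to_inode_number(int start_dir_inode, const char *path) {
    assert(path != NULL);
    int current = start_dir_inode;
    while (*path != '\0') {
        size_t len = strcspn(path, "/");   /* length of next component */
        if (len > 0) {
            current = lookup(current, path, len);
            if (current < 0)
                return -1;
        }
        path += len;
        if (*path == '/')
            path++;                        /* skip the separator */
    }
    return current;
}
```

For instance, resolving "Homeworks/HowToSubmit.txt" from inode 1001 first finds inode 2011 and then inode 1070, which mirrors how a DNS resolver walks a domain name label by label.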

2.5 More words on system calls

                 ┌────────────────────────────────────────┐ 0xffffffff
                 │                                        │
                 :                 kernel                 :
                 │                                        │
0xc0000000 - - - ╠════════════════════════════════════════╣ - - -
                 │                                        │ 0xbfffffff
                 :                                        :
                 │                                        │
               ▲ ├────────────────────────────────────────┤
               │ │  user      main()'s stack frame        │
               │ │  stack     ─ ─ ─ ─ ─ ─ ─ ─ ─           │
               │ │            foo()'s stack frame         │
             o │ │─ ─ ─ ─ ─ ─ ─ ─ ─ ─│─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │<- %esp
             n │ │                   ▼                    │   (stack pointer
             e │ │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │   register)
               │ │          memory mapped region          │
             u │ │          for shared libraries          │
             s │ │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
             e │ │                   ▲                    │
             r │ │─ ─ ─ ─ ─ ─ ─ ─ ─ ─│─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │<- brk
               │ │              runtime heap              │   (variable
             p │ │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │   maintained
             r │ │                  bss                   │   by kernel)
             o │ │      (uninitialized static data)       │
             c │ │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
             e │ │        rodata (read-only data)         │ ┐
             s │ │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │
             s │ │                  data                  │ ├ a.out
               │ │─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │
               │ │              program text              │ ┘
               │ │         (binary instructions)          │ <- %eip
               ▼ ├────────────────────────────────────────┤   (instruction pointer
                 │                                        │   register points to
                 :                                        :   the next instruct.)
                 │                                        │
                 └────────────────────────────────────────┘ 0x00000000
                           Virtual Memory Space
EOF